Information retrieval in hydrochemical data using the latent semantic indexing approach

Authors

  • Petr Praus
  • Pavel Praks
Abstract

Petr Praus (corresponding author), Department of Analytical Chemistry and Material Testing, VSB-Technical University Ostrava, 17 listopadu 15, 708 33 Ostrava, Czech Republic. Tel.: +420 59 732 3370; Fax: 420 59 732 3370; E-mail: [email protected]

Pavel Praks, Department of Mathematics and Descriptive Geometry, Department of Applied Mathematics, VSB-Technical University Ostrava, 17 listopadu 15, 708 33 Ostrava, Czech Republic

The latent semantic indexing (LSI) method was applied to retrieve similar samples (samples with a similar composition) from a dataset of groundwater samples. The LSI procedure consisted of two steps: (i) reduction of the data dimensionality by principal component analysis (PCA) and (ii) calculation of the similarity between selected samples (queries) and the other samples. Similarity was expressed as the cosine similarity and as the Euclidean and Manhattan distances. Five queries were chosen to represent different sampling localities. The original data space of 14 variables measured in 95 groundwater samples was reduced to the three-dimensional space of the three largest principal components, which explained nearly 80% of the total variance. The five samples closest to each query were evaluated. The LSI outputs were compared with retrievals in the orthogonal system of all variables transformed by PCA and in the system of standardized original variables. Most of these retrievals did not agree with the LSI ones, most likely because both systems contained interfering data noise that had not been removed beforehand by dimensionality reduction. The LSI approach, which is based on noise filtration, was therefore considered a promising strategy for information retrieval in real hydrochemical data.
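The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`standardize`, `pca_scores`, `retrieve`), the choice to standardize the variables before PCA, and the random placeholder matrix standing in for the 95 × 14 groundwater dataset are all assumptions made for illustration only.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit sample variance (assumed preprocessing)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(Z, k):
    """Return the scores on the k largest principal components and
    the fraction of total variance those components explain."""
    # SVD of the standardized matrix: Z = U S Vt; rows of Vt are the PC directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Z @ Vt[:k].T, explained[:k].sum()


def retrieve(scores, query_idx, n=5, measure="cosine"):
    """Return the indices of the n samples most similar to the query sample."""
    q = scores[query_idx]
    if measure == "cosine":
        sims = scores @ q / (np.linalg.norm(scores, axis=1) * np.linalg.norm(q))
        order = np.argsort(-sims)          # larger similarity = closer
    elif measure == "euclidean":
        order = np.argsort(np.linalg.norm(scores - q, axis=1))
    elif measure == "manhattan":
        order = np.argsort(np.abs(scores - q).sum(axis=1))
    else:
        raise ValueError(f"unknown measure: {measure}")
    order = order[order != query_idx]      # the query itself is not a retrieval result
    return order[:n]


# Hypothetical placeholder data with the same shape as the dataset in the abstract.
rng = np.random.default_rng(0)
X = rng.normal(size=(95, 14))

Z = standardize(X)
scores, explained = pca_scores(Z, k=3)
for measure in ("cosine", "euclidean", "manhattan"):
    print(measure, retrieve(scores, query_idx=0, n=5, measure=measure))
```

With random data the three components explain far less variance than the roughly 80% reported for the real samples, so the printed indices carry no hydrochemical meaning; the sketch only shows the order of operations.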

Similar resources

Image Retrieval: Content versus Context

In this paper, we introduce a new approach to image retrieval. This new approach takes the best of two worlds: it combines image features (content) and words from collateral text (context) into one semantic space. Our approach uses Latent Semantic Indexing, a method that uses co-occurrence statistics to uncover hidden semantics. This paper shows how this method, which has proven successful in bot...

Probabilistic Latent Semantic Indexing Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast ...

Latent Semantic Indexing using Multiresolution Analysis

Latent semantic indexing (LSI) is commonly used to match queries to documents in information retrieval (IR) applications. It has been shown to improve the retrieval performance, as it can deal with synonymy and polysemy problems. This paper proposes a hybrid approach which can improve result accuracy significantly. Evaluation of the approach based on using the Haar wavelet transform (HWT) as a ...

Semantic Retrieval Using Ontology and Document Refinement

To enhance the retrieval accuracy of information search engines, this paper proposes an information retrieval system based on semantics and document refinement that is realized by applying the semantic description and relevance of an ontology to the information system. We describe the use of the LSI (latent semantic indexing) approach to replace the traditional VSM (vector-space model) approach in detai...

Semantic Indexing Using WordNet Senses

We describe in this paper a Boolean Information Retrieval system that adds word semantics to the classic word-based indexing. Two of the main tasks of our system, namely the indexing and retrieval components, use a combined word-based and sense-based approach. The key to our system is a methodology for building semantic representations of open text, at word and collocation level. This ne...

Publication date: 2007